Correction of sequence-based artifacts in serial analysis of gene expression

Authors

  • Viatcheslav R. Akmaev
  • Clarence J. Wang
Abstract

MOTIVATION: Serial Analysis of Gene Expression (SAGE) is a powerful technology for measuring global gene expression through rapid generation of large numbers of transcript tags. Beyond their intrinsic value in differential gene expression analysis, SAGE tag collections afford abundant information on the size and shape of the sample transcriptome and can accelerate novel gene discovery. These latter SAGE applications are facilitated by the enhanced method of Long SAGE. A characteristic of sequencing-based methods, such as SAGE and Long SAGE, is the unavoidable occurrence of artifact sequences resulting from sequencing errors. By virtue of their low, random incidence, such tag errors have minimal impact on differential expression analysis. However, to fully exploit the value of large SAGE tag datasets, it is desirable to account for and correct tag artifacts.

RESULTS: We present estimates for occurrences of tag errors, and an efficient error correction algorithm. Error rate estimates are based on a stochastic model that includes the polymerase chain reaction (PCR) and sequencing error contributions. The correction algorithm, SAGEScreen, is a multi-step procedure that addresses ditag processing, estimation of empirical error rates from highly abundant tags, grouping of similar-sequence tags and statistical testing of observed counts. We apply SAGEScreen to Long SAGE libraries and compare error rates for several processing scenarios. Results with simulated tag collections indicate that SAGEScreen corrects 78% of recoverable tag errors and reduces the occurrences of singleton tags.

AVAILABILITY: The SAGEScreen software is available for academic users from the first author.
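The general idea of grouping similar-sequence tags and testing observed counts can be illustrated with a minimal Python sketch. This is not the published SAGEScreen procedure, whose details are not given in the abstract. It is a simplified stand-in that assumes a fixed per-base error rate (here 0.01, an arbitrary illustrative value), considers only single-base substitutions, and uses a one-sided binomial test. The function names `hamming1_neighbors`, `binom_sf` and `screen_tags` are hypothetical.

```python
def hamming1_neighbors(tag):
    """Yield every sequence that differs from `tag` at exactly one base."""
    for i, base in enumerate(tag):
        for alt in "ACGT":
            if alt != base:
                yield tag[:i] + alt + tag[i + 1:]


def binom_sf(k, n, q):
    """Return P(X >= k) for X ~ Binomial(n, q), using an iterative pmf."""
    pmf = (1.0 - q) ** n  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += pmf
        # Recurrence: P(X = i + 1) from P(X = i)
        pmf *= (n - i) / (i + 1) * q / (1.0 - q)
    return max(0.0, 1.0 - cdf)


def screen_tags(counts, per_base_error=0.01, alpha=0.05):
    """Merge rare tags into an abundant one-mismatch neighbor when the rare
    count is statistically consistent with errors (illustrative only).

    Assumes all tags have the same length and that an erroneous base is
    equally likely to become any of the three other bases.
    """
    corrected = dict(counts)
    tag_len = len(next(iter(counts)))
    # Probability that one copy of a parent tag is read as one specific
    # single-base variant: one error at one position, none elsewhere.
    q = (per_base_error / 3.0) * (1.0 - per_base_error) ** (tag_len - 1)
    # Visit tags from rarest to most abundant, so a parent (strictly more
    # abundant) is always still present when a child is merged into it.
    for tag in sorted(counts, key=counts.get):
        k = counts[tag]
        parents = [n for n in hamming1_neighbors(tag) if counts.get(n, 0) > k]
        if not parents:
            continue
        parent = max(parents, key=counts.get)
        # A large tail probability means k copies are plausible as errors.
        if binom_sf(k, counts[parent], q) > alpha:
            corrected[parent] += corrected.pop(tag)
    return corrected
```

With these assumptions, a tag seen twice next to a one-mismatch neighbor seen 1000 times is merged into that neighbor, while a one-mismatch neighbor seen 50 times is kept as a distinct tag, since 50 copies are far more than the assumed error rate would produce.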

Similar resources

Detection of gene expression and sequence analysis of chicken class II trans activator (CIITA)

BACKGROUND: Class II transactivator (CIITA) is a dominant transcriptional element, controlling numerous genes in the immune system. CIITA is expressed in a constitutive pattern in antigen presenting cells although its expression can occur in other cell types. Since the revelation of CIITA, there have been considerable advances toward understanding its role as an activator of MHC II genes in humans and...

Computed tomography based attenuation correction in PET/CT: Principles, instrumentation, protocols, artifacts and future trends

The advent of dual-modality PET/CT imaging has revolutionized the practice of clinical oncology, cardiology and neurology by improving lesion localization and the possibility of accurate quantitative analysis. In addition, the use of CT images for CT-based attenuation correction (CTAC) makes it possible to decrease the overall scanning time and to create a noise-free attenuat...

Cloning and Expression Analysis of ZmERD3 Gene From Zea mays

Background: Stresses (such as drought, salt, viruses, and others) seriously affect plant productivity. To cope with these threats, plants express a large number of genes, including several members of the ERD (early responsive to dehydration) genes, to synthesize and assemble adaptive molecules. However, the function of the ERD3 gene has not been known so far. Objectives:...

Cloning and Expression of Immunogenic Regions of EMA-1 Gene of Theileria equi From Infected Horses

Diversity among the pathogenic strains of Theileria equi (T. equi), a major agent of equine piroplasmosis, can affect the appropriate detection of parasite and host immunization. Production of recombinant surface proteins from an infected horse in natural endemic area provides a reliable tool for immunodiagnosis of parasite. Regarding this, the present study was targeted toward the cloning, exp...

CLONING AND EXPRESSION OF LEISHMANOLYSIN GENE FROM LEISHMANIA MAJOR IN PRIMATE CELL LINES

Leishmaniasis is a worldwide disease that is caused by different species of the genus Leishmania. Leishmanolysin, one of the genes expressed by Leishmania, appears to be an ideal candidate for genetic vaccination. In this study, a full length sequence, which encodes Leishmanolysin functionally critical regions (amino acids 100-579), was cloned from a Leishmania strain endemic to Iran. Analysis...

Journal title:
  • Bioinformatics

Volume 20, Issue 8

Pages -

Publication date 2004